{
 "cells": [
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b0902b4b",
   "metadata": {},
   "source": [
    "<a href=\"https://colab.research.google.com/github/run-llama/llama_index/blob/main/docs/examples/vector_stores/Timescalevector.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "db0855d0",
   "metadata": {},
   "source": [
    "# Timescale Vector Store (PostgreSQL)\n",
    "\n",
    "This notebook shows how to use the Postgres vector store `TimescaleVector` to store and query vector embeddings.\n",
    "\n",
    "## What is Timescale Vector?\n",
    "**[Timescale Vector](https://www.timescale.com/ai) is PostgreSQL++ for AI applications.**\n",
    "\n",
    "Timescale Vector enables you to efficiently store and query millions of vector embeddings in `PostgreSQL`.\n",
    "- Enhances `pgvector` with faster and more accurate similarity search on millions of vectors via DiskANN inspired indexing algorithm.\n",
    "- Enables fast time-based vector search via automatic time-based partitioning and indexing.\n",
    "- Provides a familiar SQL interface for querying vector embeddings and relational data.\n",
    "\n",
    "Timescale Vector scales with you from POC to production:\n",
    "- Simplifies operations by enabling you to store relational metadata, vector embeddings, and time-series data in a single database.\n",
    "- Benefits from rock-solid PostgreSQL foundation with enterprise-grade feature liked streaming backups and replication, high-availability and row-level security.\n",
    "- Enables a worry-free experience with enterprise-grade security and compliance.\n",
    "\n",
    "## How to use Timescale Vector\n",
    "Timescale Vector is available on [Timescale](https://www.timescale.com/ai), the cloud PostgreSQL platform. (There is no self-hosted version at this time.)\n",
    "\n",
    "**LlamaIndex users get a 90-day free trial for Timescale Vector.**\n",
    "- To get started, [signup](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=llamaindex&utm_medium=referral) to Timescale, create a new database and follow this notebook!\n",
    "- See the [Timescale Vector explainer blog](https://www.timescale.com/blog/how-we-made-postgresql-the-best-vector-database/?utm_campaign=vectorlaunch&utm_source=llamaindex&utm_medium=referral) for details and performance benchmarks.\n",
    "- See the [installation instructions](https://github.com/timescale/python-vector) for more details on using Timescale Vector in python."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "cfb491c6",
   "metadata": {},
   "source": [
    "## 0. Setup\n",
    "Let's import everything we'll need for this notebook."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "1e6e6483",
   "metadata": {},
   "source": [
    "If you're opening this Notebook on colab, you will probably need to install LlamaIndex 🦙."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5807eefe",
   "metadata": {},
   "outputs": [],
   "source": [
    "!pip install llama-index"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c2d1c538",
   "metadata": {},
   "outputs": [],
   "source": [
    "# import logging\n",
    "# import sys\n",
    "\n",
    "# Uncomment to see debug logs\n",
    "# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)\n",
    "# logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))\n",
    "\n",
    "import timescale_vector\n",
    "from llama_index import SimpleDirectoryReader, StorageContext\n",
    "from llama_index.indices.vector_store import VectorStoreIndex\n",
    "from llama_index.vector_stores import TimescaleVectorStore\n",
    "from llama_index.vector_stores.types import VectorStoreQuery, MetadataFilters\n",
    "import textwrap\n",
    "import openai"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "26c71b6d",
   "metadata": {},
   "source": [
    "### Setup OpenAI API Key\n",
    "To create embeddings for documents loaded into the index, let's configure your OpenAI API key:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "67b86621",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get openAI api key by reading local .env file\n",
    "# The .env file should contain a line starting with `OPENAI_API_KEY=sk-`\n",
    "import os\n",
    "from dotenv import load_dotenv, find_dotenv\n",
    "\n",
    "_ = load_dotenv(find_dotenv())\n",
    "\n",
    "# OR set it explicitly\n",
    "# import os\n",
    "# os.environ[\"OPENAI_API_KEY\"] = \"<your key>\"\n",
    "openai.api_key = os.environ[\"OPENAI_API_KEY\"]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7bd24f0a",
   "metadata": {},
   "source": [
    "### Create a PostgreSQL database and get a Timescale service URL\n",
    "You need a service url to connect to your Timescale database instance.\n",
    "\n",
    "First, launch a new cloud database in [Timescale](https://console.cloud.timescale.com/signup?utm_campaign=vectorlaunch&utm_source=llamaindex&utm_medium=referral) (sign up for free using the link above).\n",
    "\n",
    "To connect to your cloud PostgreSQL database, you'll need your service URI, which can be found in the cheatsheet or `.env` file you downloaded after creating a new database. \n",
    "\n",
    "The URI will look something like this: `postgres://tsdbadmin:<password>@<id>.tsdb.cloud.timescale.com:<port>/tsdb?sslmode=require`"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e6d61e73",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get the service url by reading local .env file\n",
    "# The .env file should contain a line starting with `TIMESCALE_SERVICE_URL=postgresql://`\n",
    "import os\n",
    "from dotenv import load_dotenv, find_dotenv\n",
    "\n",
    "_ = load_dotenv(find_dotenv())\n",
    "\n",
    "TIMESCALE_SERVICE_URL = os.environ[\"TIMESCALE_SERVICE_URL\"]\n",
    "\n",
    "# OR set it explicitly\n",
    "# TIMESCALE_SERVICE_URL = \"postgres://tsdbadmin:<password>@<id>.tsdb.cloud.timescale.com:<port>/tsdb?sslmode=require\""
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f7010b1d-d1bb-4f08-9309-a328bb4ea396",
   "metadata": {},
   "source": [
    "## 1. Simple Similarity Search with Timescale Vector"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "38c9e93b",
   "metadata": {},
   "source": [
    "### Download Data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5808d8f1",
   "metadata": {},
   "outputs": [],
   "source": [
    "!mkdir -p 'data/paul_graham/'\n",
    "!wget 'https://raw.githubusercontent.com/run-llama/llama_index/main/docs/examples/data/paul_graham/paul_graham_essay.txt' -O 'data/paul_graham/paul_graham_essay.txt'"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "5bf7f4d0",
   "metadata": {},
   "source": [
    "### Loading documents\n",
    "For this example, we'll use a [SimpleDirectoryReader](https://gpt-index.readthedocs.io/en/stable/examples/data_connectors/simple_directory_reader.html) to load the documents stored in the the `paul_graham_essay` directory. \n",
    "\n",
    "The `SimpleDirectoryReader` is one of LlamaIndex's most commonly used data connectors to read one or multiple files from a directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "c154dd4b",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Document ID: 740ce1a1-4d95-40cc-b7f7-6d2874620a53\n"
     ]
    }
   ],
   "source": [
    "# load sample data from the data directory using a SimpleDirectoryReader\n",
    "documents = SimpleDirectoryReader(\"./data/paul_graham\").load_data()\n",
    "print(\"Document ID:\", documents[0].doc_id)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "c0232fd1",
   "metadata": {},
   "source": [
    "### Create a VectorStore Index with the TimescaleVectorStore\n",
    "Next, to perform a similarity search, we first create a `TimescaleVector` [vector store](https://gpt-index.readthedocs.io/en/stable/core_modules/data_modules/storage/vector_stores.html) to store our vector embeddings from the essay content. TimescaleVectorStore takes a few arguments, namely the `service_url` which we loaded above, along with a `table_name` which we will be the name of the table that the vectors are stored in.\n",
    "\n",
    "Then we create a [Vector Store Index](https://gpt-index.readthedocs.io/en/stable/community/integrations/vector_stores.html#vector-store-index) on the documents backed by Timescale using the previously documents."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8731da62",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a TimescaleVectorStore to store the documents\n",
    "vector_store = TimescaleVectorStore.from_params(\n",
    "    service_url=TIMESCALE_SERVICE_URL,\n",
    "    table_name=\"paul_graham_essay\",\n",
    ")\n",
    "\n",
    "# Create a new VectorStoreIndex using the TimescaleVectorStore\n",
    "storage_context = StorageContext.from_defaults(vector_store=vector_store)\n",
    "index = VectorStoreIndex.from_documents(\n",
    "    documents, storage_context=storage_context\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8ee4473a-094f-4d0a-a825-e1213db07240",
   "metadata": {},
   "source": [
    "### Query the index\n",
    "Now that we've indexed the documents in our VectorStore, we can ask questions about our documents in the index by using the default `query_engine`.\n",
    "\n",
    "Note you can also configure the query engine to configure the top_k most similar results returned, as well as metadata filters to filter the results by. See the [configure standard query setting section](https://gpt-index.readthedocs.io/en/stable/core_modules/data_modules/index/vector_store_guide.html) for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0a2bcc07",
   "metadata": {},
   "outputs": [],
   "source": [
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"Did the author work at YC?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8cf55bf7",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Yes, the author did work at YC.\n"
     ]
    }
   ],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "68cbd239-880e-41a3-98d8-dbb3fab55431",
   "metadata": {},
   "outputs": [],
   "source": [
    "response = query_engine.query(\"What did the author work on before college?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "fdf5287f",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Before college, the author worked on writing and programming. They wrote short stories and also\n",
      "tried programming on the IBM 1401 computer using an early version of Fortran.\n"
     ]
    }
   ],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "46c602fb",
   "metadata": {},
   "source": [
    "### Querying existing index\n",
    "In the example above, we created a new Timescale Vector vectorstore and index from documents we loaded. Next we'll look at how to query an existing index. All we need is the service URI and the table name we want to access."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "04b30af9",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store = TimescaleVectorStore.from_params(\n",
    "    service_url=TIMESCALE_SERVICE_URL,\n",
    "    table_name=\"paul_graham_essay\",\n",
    ")\n",
    "\n",
    "index = VectorStoreIndex.from_vector_store(vector_store=vector_store)\n",
    "query_engine = index.as_query_engine()\n",
    "response = query_engine.query(\"What did the author do before YC?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0ae61855",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Before YC, the author wrote all of YC's internal software in Arc. They also worked on HN and had\n",
      "three projects: writing essays, working on YC, and working in Arc. However, they gradually stopped\n",
      "working on Arc due to time constraints and the increasing dependence on it for infrastructure.\n"
     ]
    }
   ],
   "source": [
    "print(textwrap.fill(str(response), 100))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "b33a8e8b",
   "metadata": {},
   "source": [
    "## 2. Using ANN search indexes to speed up queries\n",
    "\n",
    "(Note: These indexes are ANN indexes, and differ from the index concept in LlamaIndex)\n",
    "\n",
    "You can speed up similarity queries by creating an index on the embedding column. You should only do this once you have ingested a large part of your data.\n",
    "\n",
    "Timescale Vector supports the following indexes:\n",
    "- timescale_vector_index: a disk-ann inspired graph index for fast similarity search (default).\n",
    "- pgvector's HNSW index: a hierarchical navigable small world graph index for fast similarity search.\n",
    "- pgvector's IVFFLAT index: an inverted file index for fast similarity search.\n",
    "\n",
    "Important note: In PostgreSQL, each table can only have one index on a particular column. So if you'd like to test the performance of different index types, you can do so either by (1) creating multiple tables with different indexes, (2) creating multiple vector columns in the same table and creating different indexes on each column, or (3) by dropping and recreating the index on the same column and comparing results."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8264fde3",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Instantiate the TimescaleVectorStore from part 1\n",
    "vector_store = TimescaleVectorStore.from_params(\n",
    "    service_url=TIMESCALE_SERVICE_URL,\n",
    "    table_name=\"paul_graham_essay\",\n",
    ")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "00a4fbee",
   "metadata": {},
   "source": [
    "Using the `create_index()` function without additional arguments will create a `timescale_vector (DiskANN)` index by default, using the default parameters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7b9e60e4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a timescale vector index (DiskANN)\n",
    "vector_store.create_index()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "f6b74702",
   "metadata": {},
   "source": [
    "You can also specify the parameters for the index. See the Timescale Vector documentation for a full discussion of the different parameters and their effects on performance."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "f838181f",
   "metadata": {},
   "outputs": [],
   "source": [
    "# drop old index\n",
    "vector_store.drop_index()\n",
    "\n",
    "# create new timescale vector index (DiskANN) with specified parameters\n",
    "vector_store.create_index(\"tsv\", max_alpha=1.0, num_neighbors=50)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "494826c3",
   "metadata": {},
   "source": [
    "Timescale Vector also supports HNSW and ivfflat indexes:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "66606b76",
   "metadata": {},
   "outputs": [],
   "source": [
    "vector_store.drop_index()\n",
    "\n",
    "# Create an HNSW index\n",
    "# Note: You don't need to specify m and ef_construction parameters as we set smart defaults.\n",
    "vector_store.create_index(\"hnsw\", m=16, ef_construction=64)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "5e3775ff",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create an IVFFLAT index\n",
    "# Note: You don't need to specify num_lists and num_records parameters as we set smart defaults.\n",
    "vector_store.drop_index()\n",
    "vector_store.create_index(\"ivfflat\", num_lists=20, num_records=1000)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "27df7dcd",
   "metadata": {},
   "source": [
    "We recommend using `timescale-vector` or `HNSW` indexes in general."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6d8b987b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# drop the ivfflat index\n",
    "vector_store.drop_index()\n",
    "# Create a timescale vector index (DiskANN)\n",
    "vector_store.create_index()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "9aa5fda2",
   "metadata": {},
   "source": [
    "## 3. Similarity Search with time-based filtering\n",
    "\n",
    "A key use case for Timescale Vector is efficient time-based vector search. Timescale Vector enables this by automatically partitioning vectors (and associated metadata) by time. This allows you to efficiently query vectors by both similarity to a query vector and time.\n",
    "\n",
    "Time-based vector search functionality is helpful for applications like:\n",
    "- Storing and retrieving LLM response history (e.g. chatbots)\n",
    "- Finding the most recent embeddings that are similar to a query vector (e.g recent news).\n",
    "- Constraining similarity search to a relevant time range (e.g asking time-based questions about a knowledge base)\n",
    "\n",
    "To illustrate how to use TimescaleVector's time-based vector search functionality, we'll use the git log history for TimescaleDB as a sample dataset and ask questions about it. Each git commit entry has a timestamp associated with it, as well as natural language message and other metadata (e.g author, commit hash etc). \n",
    "\n",
    "We'll illustrate how to create nodes with a time-based uuid and how run similarity searches with time range filters using the TimescaleVector vectorstore."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "5436fd4b",
   "metadata": {},
   "source": [
    "### Extract content and metadata from git log CSV file\n",
    "\n",
    "First lets load in the git log csv file into a new collection in our PostgreSQL database named `timescale_commits`.\n",
    "\n",
    "Note: Since this is a demo, we will only work with the first 1000 records. In practice, you can load as many records as you want."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "3fe94773",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "from pathlib import Path\n",
    "\n",
    "file_path = Path(\"../data/csv/commit_history.csv\")\n",
    "# Read the CSV file into a DataFrame\n",
    "df = pd.read_csv(file_path)\n",
    "\n",
    "# Light data cleaning on CSV\n",
    "df.dropna(inplace=True)\n",
    "df = df.astype(str)\n",
    "df = df[:1000]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "eb2c9344",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Take a look at the data in the csv (optional)\n",
    "df.head()"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "2b9139b5",
   "metadata": {},
   "source": [
    "We'll define a helper funciton to create a uuid for a node and associated vector embedding based on its timestamp. We'll use this function to create a uuid for each git log entry.\n",
    "\n",
    "Important note: If you are working with documents/nodes and want the current date and time associated with vector for time-based search, you can skip this step. A uuid will be automatically generated when the nodes are added to the table in Timescale Vector by default. In our case, because we want the uuid to be based on the timestamp in the past, we need to create the uuids manually."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2ef55d54",
   "metadata": {},
   "outputs": [],
   "source": [
    "from timescale_vector import client\n",
    "\n",
    "\n",
    "# Function to take in a date string in the past and return a uuid v1\n",
    "def create_uuid(date_string: str):\n",
    "    if date_string is None:\n",
    "        return None\n",
    "    time_format = \"%a %b %d %H:%M:%S %Y %z\"\n",
    "    datetime_obj = datetime.strptime(date_string, time_format)\n",
    "    uuid = client.uuid_from_time(datetime_obj)\n",
    "    return str(uuid)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2d622744",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Helper functions\n",
    "from typing import List, Tuple\n",
    "\n",
    "\n",
    "# Helper function to split name and email given an author string consisting of Name Lastname <email>\n",
    "def split_name(input_string: str) -> Tuple[str, str]:\n",
    "    if input_string is None:\n",
    "        return None, None\n",
    "    start = input_string.find(\"<\")\n",
    "    end = input_string.find(\">\")\n",
    "    name = input_string[:start].strip()\n",
    "    return name\n",
    "\n",
    "\n",
    "from datetime import datetime, timedelta\n",
    "\n",
    "\n",
    "def create_date(input_string: str) -> datetime:\n",
    "    if input_string is None:\n",
    "        return None\n",
    "    # Define a dictionary to map month abbreviations to their numerical equivalents\n",
    "    month_dict = {\n",
    "        \"Jan\": \"01\",\n",
    "        \"Feb\": \"02\",\n",
    "        \"Mar\": \"03\",\n",
    "        \"Apr\": \"04\",\n",
    "        \"May\": \"05\",\n",
    "        \"Jun\": \"06\",\n",
    "        \"Jul\": \"07\",\n",
    "        \"Aug\": \"08\",\n",
    "        \"Sep\": \"09\",\n",
    "        \"Oct\": \"10\",\n",
    "        \"Nov\": \"11\",\n",
    "        \"Dec\": \"12\",\n",
    "    }\n",
    "\n",
    "    # Split the input string into its components\n",
    "    components = input_string.split()\n",
    "    # Extract relevant information\n",
    "    day = components[2]\n",
    "    month = month_dict[components[1]]\n",
    "    year = components[4]\n",
    "    time = components[3]\n",
    "    timezone_offset_minutes = int(\n",
    "        components[5]\n",
    "    )  # Convert the offset to minutes\n",
    "    timezone_hours = timezone_offset_minutes // 60  # Calculate the hours\n",
    "    timezone_minutes = (\n",
    "        timezone_offset_minutes % 60\n",
    "    )  # Calculate the remaining minutes\n",
    "    # Create a formatted string for the timestamptz in PostgreSQL format\n",
    "    timestamp_tz_str = (\n",
    "        f\"{year}-{month}-{day} {time}+{timezone_hours:02}{timezone_minutes:02}\"\n",
    "    )\n",
    "    return timestamp_tz_str"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6fdaf15f",
   "metadata": {},
   "source": [
    "Next, we'll define a function to create a `TextNode` for each git log entry. We'll use the helper function `create_uuid()` we defined above to create a uuid for each node based on its timestampe. And we'll use the helper functions `create_date()` and `split_name()` above to extract relevant metadata from the git log entry and add them to the node."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "626ad668",
   "metadata": {},
   "outputs": [],
   "source": [
    "from llama_index.schema import TextNode, NodeRelationship, RelatedNodeInfo\n",
    "\n",
    "\n",
    "# Create a Node object from a single row of data\n",
    "def create_node(row):\n",
    "    record = row.to_dict()\n",
    "    record_name = split_name(record[\"author\"])\n",
    "    record_content = (\n",
    "        str(record[\"date\"])\n",
    "        + \" \"\n",
    "        + record_name\n",
    "        + \" \"\n",
    "        + str(record[\"change summary\"])\n",
    "        + \" \"\n",
    "        + str(record[\"change details\"])\n",
    "    )\n",
    "    # Can change to TextNode as needed\n",
    "    node = TextNode(\n",
    "        id_=create_uuid(record[\"date\"]),\n",
    "        text=record_content,\n",
    "        metadata={\n",
    "            \"commit\": record[\"commit\"],\n",
    "            \"author\": record_name,\n",
    "            \"date\": create_date(record[\"date\"]),\n",
    "        },\n",
    "    )\n",
    "    return node"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "58450993",
   "metadata": {},
   "outputs": [],
   "source": [
    "nodes = [create_node(row) for _, row in df.iterrows()]"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "83c2d55c",
   "metadata": {},
   "source": [
    "Next we'll create vector embeddings of the content of each node so that we can perform similarity search on the text associated with each node. We'll use the `OpenAIEmbedding` model to create the embeddings."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "0fa62342",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create embeddings for nodes\n",
    "from llama_index.embeddings import OpenAIEmbedding\n",
    "\n",
    "embedding_model = OpenAIEmbedding()\n",
    "\n",
    "for node in nodes:\n",
    "    node_embedding = embedding_model.get_text_embedding(\n",
    "        node.get_content(metadata_mode=\"all\")\n",
    "    )\n",
    "    node.embedding = node_embedding"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "bf1fc37d",
   "metadata": {},
   "source": [
    "Let's examine the first node in our collection to see what it looks like."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "e99fa068",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "commit: 44e41c12ab25e36c202f58e068ced262eadc8d16\n",
      "author: Lakshmi Narayanan Sreethar\n",
      "date: 2023-09-5 21:03:21+0850\n",
      "\n",
      "Tue Sep 5 21:03:21 2023 +0530 Lakshmi Narayanan Sreethar Fix segfault in set_integer_now_func When an invalid function oid is passed to set_integer_now_func, it finds out that the function oid is invalid but before throwing the error, it calls ReleaseSysCache on an invalid tuple causing a segfault. Fixed that by removing the invalid call to ReleaseSysCache.  Fixes #6037\n"
     ]
    }
   ],
   "source": [
    "print(nodes[0].get_content(metadata_mode=\"all\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8b9f26ae",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[-0.005366453900933266, 0.0016374519327655435, 0.005981510039418936, -0.026256779208779335, -0.03944991156458855, 0.026299940422177315, -0.0200558640062809, -0.01252412423491478, -0.04241368919610977, -0.004758591763675213, 0.05639812350273132, 0.006578581873327494, 0.014833281747996807, 0.009509989991784096, 0.0009675443288870156, -0.013157163746654987, -0.002265996066853404, -0.017048921436071396, 0.006553404498845339, -0.00217068032361567, 0.009085564874112606, 0.011775985360145569, -0.02514895796775818, -0.002679630182683468, 0.0030608929228037596, -3.439458305365406e-05, -0.00363818253390491, -0.03939236328005791, 0.0016806137282401323, -0.01207092497497797, 0.01739421673119068, -0.02241537719964981, -0.01753808930516243, -0.023782167583703995, -0.01598426327109337, -0.02575322426855564, -0.016876274719834328, -0.006380756851285696, -0.0009149408433586359, 0.00704616867005825, -0.0013290246715769172, -0.009776154533028603, -0.013200325891375542, -0.024832438677549362, -0.0019404839258641005, 0.027220726013183594, -0.004765785299241543, -0.008553235791623592, -0.023120352998375893, 0.006920279935002327, 0.017739512026309967, 0.0166892409324646, -0.019408436492085457, 0.010207772254943848, 0.01595548912882805, 0.004783769138157368, 0.008855368942022324, 0.018084805458784103, -0.012603254057466984, -0.002003428293392062, -0.0008407564600929618, 0.00394211383536458, -0.018948042765259743, 0.005722539033740759, -0.004244246520102024, -0.011502627283334732, -0.000936971337068826, 0.006873521022498608, -0.0038593867793679237, 0.0003349537728354335, 0.02490437589585781, 0.022861381992697716, -0.013833366334438324, 0.005657796282321215, 0.027896929532289505, -0.020415544509887695, -0.007143282797187567, 0.014862056821584702, -0.00667569600045681, -0.020199736580252647, 0.01827184110879898, -0.0030698850750923157, -0.032975636422634125, 0.02595464698970318, -0.0014818893978372216, -0.004906061105430126, 0.01008548028767109, 0.009337342344224453, -0.009833703748881817, -0.0011680669849738479, 0.010653777979314327, -0.0006110096583142877, 0.016228847205638885, -0.010589035227894783, 0.0010997274657711387, 0.020300446078181267, 0.005715345498174429, 0.009862477891147137, -0.0015664147213101387, -0.009207856841385365, -0.013480877503752708, -0.01759563945233822, 0.007992131635546684, -0.012639221735298634, -0.016833113506436348, -0.01654536835849285, 0.009366116486489773, 0.004229859448969364, -0.0044168937020003796, -0.00028122629737481475, -0.028918424621224403, 0.030616123229265213, -0.017020147293806076, -0.02500508539378643, 0.01844448782503605, 0.00011554780940059572, 0.021278781816363335, -0.01503470353782177, -0.024760503321886063, -0.02408429980278015, 0.03734936937689781, 0.000861438165884465, 0.021365106105804443, -0.006740438751876354, 0.005557085387408733, -0.017005760222673416, -0.01831500232219696, -0.01458150427788496, -0.0207896139472723, -0.004100373946130276, 0.011214882135391235, 0.03228504955768585, 0.00543119665235281, 0.02251608669757843, 0.011373141780495644, 0.0207896139472723, 0.004032033961266279, 0.019768116995692253, -0.016329558566212654, -0.02755163423717022, -0.0001643296709517017, 0.04163677617907524, -0.02163846418261528, 0.019394047558307648, -0.028975974768400192, 0.040543343871831894, 0.006010284647345543, 0.009812122210860252, 0.024746114388108253, -0.027781831100583076, -0.0009360721451230347, 0.002836091909557581, -0.008733076974749565, 0.010754489339888096, -0.005380841437727213, 0.01586916483938694, 0.0003014584071934223, 0.006862730719149113, -0.033666227012872696, -0.01664607785642147, 0.001758844475261867, 0.0125528983771801, 0.0065066455863416195, 0.016228847205638885, 0.010466743260622025, 0.0251057967543602, -0.009215050376951694, -0.016027426347136497, 0.0116033386439085, 0.019667407497763634, 0.008905723690986633, 0.011517014354467392, -0.0036561666056513786, 0.02263118512928486, 0.027868153527379036, 0.02509140968322754, 0.011984600685536861, -0.016473431140184402, -0.013847753405570984, 0.01377581711858511, -0.010315677151083946, -0.0076612248085439205, 0.031076516956090927, 0.06526068598031998, -0.01297012995928526, -0.008610785007476807, 0.02340809814631939, 0.0038989519234746695, 0.009251018986105919, 0.003494309727102518, 0.009301373735070229, 0.003737095044925809, -0.03757956624031067, -0.0013029477559030056, -0.5865404605865479, -0.013444909825921059, 0.0021041391883045435, -0.004449265077710152, 0.0020052266772836447, 0.010776069946587086, 0.025695675984025, 0.01172563061118126, -0.02909107320010662, 0.015250512398779392, -0.014487987384200096, 0.04822615161538124, 0.014487987384200096, -0.030357152223587036, -0.01828622817993164, -0.00993441417813301, 0.010294096544384956, 0.0014764942461624742, 0.005524714011698961, -0.001750751631334424, -0.020386770367622375, 0.010977491736412048, -0.01082642562687397, -0.010092674754559994, 0.021091746166348457, -0.009229437448084354, 0.012013375759124756, -0.01284064445644617, -0.00015736083150841296, 0.009063984267413616, -0.013660718686878681, 0.0058807991445064545, 0.013955658301711082, 0.0028414870612323284, 0.046960070729255676, -0.01130120549350977, -0.04943468049168587, 0.006970635149627924, 0.0046219127252697945, 0.031335487961769104, -0.031306713819503784, -0.026357490569353104, 0.042902857065200806, 0.01129401195794344, -0.010941523127257824, -0.0027857364621013403, 0.00627644918859005, -0.023005254566669464, -0.0133226178586483, -0.05228336155414581, 0.025494253262877464, 0.016084974631667137, 0.014329726807773113, 0.03809750825166702, -0.025393541902303696, -0.02322106435894966, 0.006686486769467592, 0.01847326196730137, 0.02989676035940647, 0.0017372636357322335, -0.018530812114477158, 0.0418669730424881, -0.010092674754559994, 0.003927726298570633, -0.009876864962279797, -0.012322702445089817, 0.007977744564414024, 0.027177564799785614, 0.036255937069654465, 0.007841065526008606, 0.02255924977362156, 0.02076083980500698, 0.00024390929320361465, -0.01254570484161377, -0.005535504315048456, 0.014185854233801365, 0.01211408618837595, -0.020199736580252647, -0.02414184994995594, 0.006977829150855541, -0.010157416574656963, -0.025997808203101158, 0.017235957086086273, -0.008862562477588654, 0.017077697440981865, 0.02084716409444809, 0.0029583836439996958, -0.003523084335029125, 0.007387866266071796, -0.0035122937988489866, 0.01923578791320324, 0.010092674754559994, -0.0016896057641133666, 0.0016887065721675754, 0.025666899979114532, -0.0024979908484965563, -0.01455992367118597, 0.009351729415357113, -0.013631944544613361, -0.03240014612674713, -0.020573804154992104, -0.029004748910665512, -0.02096226066350937, -0.013495265506207943, 0.0216672383248806, 0.038011182099580765, -0.01664607785642147, -0.03078877180814743, 0.037234269082546234, 0.0005889791063964367, -0.0040392279624938965, -0.012365863658487797, -0.009862477891147137, -0.011783178895711899, -0.008862562477588654, -0.02922055870294571, 0.025407928973436356, 0.001157276565209031, 0.008589203469455242, -0.007941776886582375, 0.005686570890247822, 0.007190041244029999, 0.01579722948372364, -0.02337932400405407, 0.007003006525337696, -0.00768999895080924, 0.028645066544413567, -0.019710568711161613, -0.021307555958628654, -0.0257676113396883, 0.012660803273320198, -0.006607356481254101, 0.010459549725055695, 0.0007481383509002626, 0.02342248521745205, -0.007632450200617313, 0.02916300855576992, -0.02093348652124405, 0.035565346479415894, -0.011891083791851997, -0.02571006305515766, -0.0050067720003426075, -0.006441902834922075, -0.01040919404476881, -0.001439626794308424, -0.011502627283334732, -0.03838525339961052, -0.004352150950580835, 0.01746615394949913, -0.00197825045324862, -0.008661140687763691, 0.004237052984535694, -0.041377805173397064, 0.01595548912882805, -0.003506898647174239, -0.004805350210517645, -0.010229353792965412, -0.016372719779610634, 0.005852024536579847, -0.007006603758782148, 0.007790710311383009, 0.02512018382549286, -0.01458150427788496, 0.020429931581020355, -0.006862730719149113, -0.006783600896596909, -0.009898446500301361, 0.00603905925527215, -0.015279287472367287, -0.03827015310525894, -0.009409278631210327, 0.0021796722430735826, -0.011941439472138882, -0.009330148808658123, -0.010286902077496052, 0.01004231907427311, -0.023667069151997566, -0.007948970422148705, -0.013502459041774273, 0.00689150532707572, 0.028832102194428444, 0.02832854725420475, -0.0332346074283123, -0.012416219338774681, 0.009891252033412457, 0.017192795872688293, 0.01844448782503605, 0.0008421052480116487, 0.013560008257627487, 0.025292832404375076, -0.023954814299941063, 0.009912833571434021, -0.003154410282149911, -0.01086239330470562, 0.011509820818901062, 0.03752201795578003, -0.004481636453419924, -0.009013628587126732, 0.004283811431378126, 0.030299603939056396, 0.014164273627102375, 0.006959844846278429, 0.02920617163181305, -0.011977407149970531, -0.0028288981411606073, -0.023796554654836655, 0.001507966429926455, -0.008546042256057262, -0.019796893000602722, -0.021998144686222076, -0.000644280225969851, -0.014718183316290379, -0.013085227459669113, -0.005549891851842403, 0.008733076974749565, 0.042068395763635635, 0.00501756276935339, 0.01585477776825428, -0.02152336575090885, -0.01127243135124445, -0.005700958427041769, -0.003776659956201911, 0.028947200626134872, 0.004992384929209948, 0.0016985977999866009, -0.008143198676407337, 0.004729817155748606, 0.016444656997919083, 0.022372214123606682, 0.0038773708511143923, 0.0027857364621013403, 0.012365863658487797, 0.02819906175136566, 0.01549509633332491, 0.04822615161538124, -0.010222159326076508, 0.01845887489616871, -0.012236378155648708, 0.03939236328005791, -0.003064489923417568, -0.015552645549178123, 0.014200241304934025, 0.02571006305515766, -0.009466827847063541, -0.024501532316207886, -0.02660207450389862, 0.03274543955922127, -0.0028936408925801516, 0.0067907944321632385, 0.026213617995381355, -0.007431028410792351, 0.0007823081687092781, -0.004006856586784124, 0.02745092287659645, 0.02737898752093315, 0.014315339736640453, 0.004862899426370859, 0.010315677151083946, 0.01989760249853134, 0.041982073336839676, 0.01996953971683979, 0.011092590168118477, 0.001743558095768094, -0.01131559256464243, 0.015236125327646732, 0.005107482895255089, 0.018027257174253464, -0.014962767250835896, 0.0006712563335895538, 0.01573967933654785, -0.02673156000673771, -0.022156406193971634, -0.04419771209359169, 0.00019040661572944373, 0.020458707585930824, 0.011258043348789215, 0.01904875412583351, 0.015164189040660858, 0.010517098940908909, -0.021998144686222076, -0.039593786001205444, -0.01992637850344181, 0.04448545724153519, 0.011452271603047848, -0.0019728553015738726, -0.021120522171258926, 0.0043809255585074425, 0.0013056453317403793, -0.017753899097442627, -0.0200558640062809, -0.019005591049790382, 0.02409868687391281, 0.014804507605731487, 0.006682890001684427, 0.01008548028767109, -0.00481614051386714, 0.030616123229265213, -0.003010537475347519, -0.014790119603276253, -0.006780004128813744, -0.028055189177393913, 0.017940932884812355, -0.01424340344965458, 0.0034511478152126074, 0.04736291244626045, -0.012301120907068253, -0.016502205282449722, -0.0018685475224629045, -0.009999156929552555, -0.004460055846720934, 0.03487475588917732, -0.009790541604161263, -0.019422823563218117, -0.03389642387628555, 0.009991963393986225, 0.0016770168440416455, -0.005945541895925999, -0.0014899822417646646, 0.04517604783177376, 0.008596397936344147, -0.021134909242391586, -0.03263034299015999, -0.025407928973436356, 0.012257959693670273, 0.026127293705940247, 0.05098850652575493, 0.004111164249479771, 0.021149296313524246, -0.021278781816363335, 0.00727636506780982, -0.013380167074501514, -0.014171467162668705, 0.03737814351916313, -0.019653018563985825, 0.018861718475818634, 0.012991710565984249, -0.0012795684160664678, -0.016358332708477974, 0.032112400978803635, 0.017120858654379845, 0.032054852694272995, -0.006474274210631847, -0.01131559256464243, -0.014667828567326069, -0.012689577415585518, 0.01907752826809883, 0.022012531757354736, 0.01740860380232334, 0.0033450417686253786, 0.02322106435894966, 0.03893196955323219, 0.033666227012872696, 0.025566190481185913, -0.017825834453105927, -0.02571006305515766, 0.011258043348789215, 0.023465648293495178, 0.012315508909523487, -0.027235113084316254, 0.020415544509887695, 0.005945541895925999, 0.0029961501713842154, 0.02083277516067028, -0.004118357785046101, 0.012236378155648708, -0.012941354885697365, 0.008776238188147545, 0.009661056101322174, 0.009078371338546276, 0.0009558546589687467, -0.005672183819115162, 0.008869756013154984, 0.0011509821051731706, 0.021883046254515648, 0.008085649460554123, -0.012135667726397514, -0.027896929532289505, -0.03406906872987747, 0.027824992313981056, 0.029724113643169403, -0.02762356959283352, -0.023034028708934784, -0.027321437373757362, -0.02746530994772911, -0.02409868687391281, -0.016372719779610634, -0.0012382050044834614, -0.005837637465447187, 0.001647343160584569, 0.016257621347904205, -0.01657414250075817, 0.0007503863889724016, -0.02595464698970318, 0.01595548912882805, -0.0030051423236727715, -0.002246213611215353, -0.02493315003812313, 0.015610194765031338, 0.008546042256057262, -0.0008038890664465725, 0.03818383067846298, -0.004553572740405798, 0.00098912522662431, -0.0023703037295490503, -0.01996953971683979, -0.042931631207466125, -0.02591148391366005, -0.03473088517785072, 0.010941523127257824, 0.0009603506769053638, 0.021307555958628654, -0.008920111693441868, -0.011142945848405361, 0.022688735276460648, 0.016444656997919083, -0.006197319366037846, 0.020660128444433212, 0.016516592353582382, -0.01410672441124916, 0.006949054542928934, 0.014682215638458729, 0.01007109321653843, 0.009200663305819035, 0.009488408453762531, 0.023623907938599586, 0.028573131188750267, -0.005967122968286276, -0.014675022102892399, -0.013185938820242882, 0.008323038928210735, -0.0018883299781009555, 0.010193385183811188, 0.006582179106771946, -0.0028288981411606073, 0.02163846418261528, -0.005456374492496252, 0.012783095240592957, 0.01589794084429741, 0.030501024797558784, 0.0026544525753706694, 0.035392697900533676, 0.012610447593033314, -0.0026112906634807587, -0.023810941725969315, 0.0035212859511375427, -0.01579722948372364, 0.03263034299015999, -0.025465479120612144, 0.006096608471125364, 0.009207856841385365, -0.009790541604161263, -0.02926371991634369, -0.003773063188418746, 0.023868491873145103, -0.01750931516289711, 0.016171298921108246, 0.033004410564899445, -0.00790580827742815, -0.00728355860337615, 0.0021077359560877085, -0.017955319955945015, -0.0032533227931708097, -0.0004282462759874761, -0.006204512901604176, 0.014430438168346882, -0.03985275700688362, 0.01591232791543007, -0.03519127890467644, -0.03749324008822441, -0.025379154831171036, -0.019552309066057205, -0.008078455924987793, 0.042874082922935486, 0.018502037972211838, -0.010545873083174229, -0.009085564874112606, -0.026069745421409607, -0.0023127547465264797, 0.00957473274320364, -0.02929249405860901, -0.03487475588917732, -0.01132997963577509, 0.03188220411539078, 0.009107145480811596, 0.030644899234175682, -0.013459296897053719, 0.04224104434251785, -0.016789952293038368, 0.0027731475420296192, 0.002381094265729189, 0.01130120549350977, 0.008934498764574528, -0.012337089516222477, 0.007517351768910885, 0.0005318796029314399, -0.005711748730391264, -0.01535122375935316, 0.0017957119271159172, -0.004132745321840048, 0.02592587098479271, -0.026285553351044655, -0.0012876612599939108, -0.01759563945233822, -0.029839210212230682, -0.003095062915235758, -0.0052477591671049595, -0.015567032620310783, 0.01664607785642147, -0.028098350390791893, 0.009984769858419895, 0.04307550564408302, 0.0010817432776093483, 0.00710731465369463, 0.016041813418269157, 0.010437969118356705, -0.028573131188750267, 0.010704133659601212, 0.005229774862527847, 0.002433248097077012, 0.012229184620082378, -0.0018793379422277212, -0.006312417332082987, 0.01743737794458866, -0.016948211938142776, 0.009502795524895191, 0.017782673239707947, -0.00690589239820838, -0.010553067550063133, -0.01595548912882805, 0.020228510722517967, -0.028558744117617607, -0.012186023406684399, 0.024443982169032097, -0.03150813654065132, 0.006003091111779213, -0.03320583328604698, -0.024659791961312294, -0.013876527547836304, 0.007312333211302757, 0.00689869886264205, 0.0004842217604164034, 0.020991036668419838, -0.007797903846949339, -0.014437631703913212, 0.0003288841398898512, -0.01674678921699524, 0.02819906175136566, -0.006826762575656176, 0.03763711452484131, 0.03332093358039856, 0.0006676595658063889, -0.046960070729255676, 0.01986882835626602, 0.03254402056336403, 0.005783685017377138, 0.013142776675522327, 0.02589709684252739, 0.009905640035867691, 0.006765616592019796, -0.00791300181299448, -2.810014905207936e-07, -0.03228504955768585, -0.018904881551861763, 0.01572529226541519, 0.02008463814854622, 0.014732571318745613, -0.009279793128371239, -0.011768791824579239, -0.021998144686222076, -0.011135751381516457, -0.01209969911724329, 0.005643409211188555, -0.008459718897938728, -0.0033558320719748735, -0.012826256453990936, 0.03844280168414116, -0.01502031646668911, 0.009776154533028603, 0.020458707585930824, -0.009704218246042728, -0.012373057194054127, -0.021839885041117668, -0.030587349086999893, -0.005729732569307089, -0.026199230924248695, 0.04943468049168587, 0.0026490571908652782, -0.02658768743276596, 0.002050186973065138, -0.010668165050446987, -0.016200073063373566, -0.046413354575634, -0.008589203469455242, 0.04224104434251785, -0.019336499273777008, -0.009661056101322174, 0.01995515264570713, 0.013509652577340603, -0.0022803833708167076, -0.005157838575541973, -0.016890661790966988, -0.006258465349674225, -0.03591063991189003, 0.003055497771129012, 0.010761682875454426, 0.004546379204839468, -0.003618400078266859, 0.009402085095643997, -0.016473431140184402, -0.018832944333553314, -0.015624581836163998, -0.035392697900533676, 0.009862477891147137, -0.04411138966679573, 0.01038761343806982, -0.030213279649615288, -0.01329384371638298, 0.030414702370762825, 0.02084716409444809, -0.03421294316649437, 0.013444909825921059, 0.04914693534374237, -0.031134065240621567, 0.009286986663937569, 0.00023739006428513676, 0.012811869382858276, -0.03507617861032486, -0.0007841065526008606, -0.020286059007048607, -0.013027678243815899, 0.025249669328331947, -0.009028015658259392, -0.023839715868234634, -0.017164019867777824, 0.02319229021668434, -0.0019171045860275626, 0.01906314119696617, 0.0008443532860837877, 0.05933312699198723, 0.012596060521900654, 0.01253131777048111, -0.034673336893320084, -0.004103970713913441, 0.03812628239393234, -0.02589709684252739, 0.0006658611237071455, 0.002077162964269519, -0.008301458321511745, 0.008596397936344147, -0.003589625470340252, -0.005057127680629492, 0.009358922950923443, -0.03746446594595909, 0.003701126901432872, -0.0028666649013757706, 0.00605344632640481, -0.023667069151997566, -0.016401495784521103, -0.021969370543956757, 0.007841065526008606, -0.0040859864093363285, 0.014703796245157719, -0.016243234276771545, -0.02001270093023777, -0.005798072554171085, 0.015279287472367287, -0.0018577571026980877, 0.011373141780495644, -0.01164649985730648, -0.0021329137962311506, -0.0023667069617658854, -0.025508640334010124, -0.0201277993619442, -0.007071346510201693, -0.01904875412583351, 0.03755079209804535, -0.011042234487831593, 0.005898783449083567, -0.011883890256285667, -0.023595133796334267, -0.02179672382771969, -0.00870430190116167, -0.02490437589585781, 0.011387528851628304, 0.02986798621714115, -0.0021958579309284687, -0.008265490643680096, 0.025235282257199287, 0.001056565553881228, 0.020688902586698532, -0.004470846150070429, 0.0018919268622994423, 0.015509483404457569, -0.004175907000899315, 0.025307219475507736, 0.012272346764802933, -0.019653018563985825, -0.02491876296699047, 0.004352150950580835, 0.029839210212230682, 0.041233934462070465, -0.011984600685536861, -0.0008515469380654395, -0.0076684183441102505, 0.0021868660114705563, -0.016099361702799797, -0.012682383880019188, -0.005197403486818075, 0.0012750723399221897, -0.018545199185609818, 0.0014576109824702144, 0.017681961879134178, 0.00019715064263436943, -0.024760503321886063, -0.017955319955945015, -0.011351561173796654, -0.019465984776616096, 0.009869671426713467, 0.005078708287328482, 0.010754489339888096, 0.024789277464151382, -0.02332177571952343, 0.0173510555177927, 0.0037838537245988846, -0.001575406757183373, 0.0241562370210886, -0.004233456216752529, 0.006970635149627924, -0.01333700492978096, 0.014653440564870834, -0.02986798621714115, 0.008517267182469368, 0.009200663305819035, -0.0011563773732632399, 0.026299940422177315, -0.008092842996120453, -0.01424340344965458, -0.018559586256742477, -0.02594025991857052, -0.009970382787287235, -0.0010502712102606893, -0.033666227012872696, -0.017739512026309967, -0.006596566177904606, -0.027868153527379036, -0.00879781972616911, -0.024458369240164757, -0.010178998112678528, -0.004575153812766075, -0.018948042765259743, -0.02156652696430683, -0.04434158653020859, 0.002605895511806011, 0.0007724168826825917, -0.0012507938081398606, -0.047880854457616806, -0.017825834453105927, -0.010164611041545868, 0.012804675847291946, 0.027249502018094063, 0.18795537948608398, 0.006362773012369871, 0.003310872009024024, 0.01674678921699524, -0.011970213614404202, -0.0014054570347070694, 0.008840980939567089, -0.010963104665279388, -0.037953633815050125, 0.04327692836523056, -0.011142945848405361, 0.010308483615517616, -0.008617978543043137, -0.0027335823979228735, -0.008085649460554123, 0.010301290079951286, -0.043823644518852234, 0.00564700597897172, -0.05435512959957123, -0.009905640035867691, 0.033608678728342056, -0.004006856586784124, -0.006643324624747038, -0.02178233675658703, 0.02596903406083584, -0.03418416902422905, 0.010704133659601212, 0.009761766530573368, 0.014085143804550171, 0.005226178094744682, -0.023710230365395546, 0.0055750696919858456, -0.000936971337068826, -0.026961755007505417, -0.01926456205546856, -0.037953633815050125, 0.012315508909523487, 0.008466912433505058, 0.017077697440981865, 0.0058843959122896194, 0.002050186973065138, -0.0251057967543602, 0.015178576111793518, -0.01746615394949913, -0.00251237815245986, 0.032831765711307526, -0.008956079371273518, 0.0013020484475418925, -0.012480962090194225, 0.03896074369549751, -0.04566521570086479, 0.009955994784832, 0.03170955553650856, 0.011121364310383797, 0.016789952293038368, -0.009452440775930882, -0.01335139200091362, -0.0038342091720551252, 0.022314665839076042, 0.009438052773475647, -0.023940427228808403, 0.033666227012872696, 0.0005417708889581263, 0.007790710311383009, -0.018861718475818634, 0.011696855537593365, -0.02926371991634369, 0.005657796282321215, 0.02402675151824951, -0.04578031226992607, 0.009509989991784096, -0.024875599890947342, 0.004308989271521568, -0.007182847708463669, -0.013200325891375542, -0.00787703413516283, 0.0005876303184777498, 0.00913592055439949, 0.009840897284448147, 0.014437631703913212, -0.0050103687681257725, -0.04721904173493385, -0.007344704587012529, -0.0003257369389757514, -0.042010847479104996, -0.03510495275259018, 0.028832102194428444, 0.022314665839076042, -0.00880501326173544, -0.006744035519659519, 0.0007535336189903319, 0.011351561173796654, 0.011847921647131443, -0.005021159537136555, 0.005154241807758808, 0.014286565594375134, -0.00790580827742815, 0.00566139305010438, -0.022976480424404144, 0.004111164249479771, -0.013502459041774273, 0.05501694604754448, -0.011704049073159695, -0.044600557535886765, -0.00791300181299448, 0.009646669030189514, 0.012236378155648708, 0.006909489631652832, 0.010596228763461113, -0.027954477816820145, -0.00710731465369463, -0.0499238483607769, -0.011991795152425766, -0.007524545304477215, -0.00540961604565382, 0.025307219475507736, 0.0064634839072823524, -0.033004410564899445, 0.002731784014031291, -0.013193132355809212, 0.02581077441573143, -0.0014908815501257777, 0.02258802391588688, 0.018775396049022675, -0.0019998312927782536, 0.0010511704022064805, -0.010790457017719746, -0.008517267182469368, -0.00603186571970582, -0.04321937635540962, 0.015653356909751892, -0.0073806727305054665, 0.010452356189489365, -0.013099615462124348, -0.007869839668273926, -0.00067800038959831, 0.017279118299484253, 6.013431993778795e-05, 0.007096523884683847, -0.008653946220874786, -0.003602214390411973, -0.016833113506436348, -0.011797566898167133, -0.011344367638230324, 0.0015385393053293228, -0.005121870432049036, 0.023868491873145103, 0.0026652428787201643, -0.021422654390335083, 0.007395059801638126, -0.019336499273777008, -0.004528395365923643, 0.014272177591919899, 0.009783348068594933, 0.004319779574871063, -0.010617810301482677, -0.030443476513028145, -0.041406579315662384, 0.005366453900933266, -0.007157669868320227, -0.02171039953827858, 0.018919268622994423, 0.037780988961458206, 0.0038342091720551252, -0.03165200725197792, 0.003003343939781189, -0.18473263084888458, -0.0065066455863416195, -0.018631523475050926, -0.034845981746912, 0.027120016515254974, -0.011761598289012909, 0.031018966808915138, 0.0005489644827321172, -0.006830359343439341, -0.01999831385910511, 0.011416303925216198, 0.0192501749843359, -0.049578554928302765, -0.014660634100437164, -0.004258633591234684, -0.015192964114248753, 0.014732571318745613, -0.0006928372895345092, 0.03691774979233742, 0.017739512026309967, 0.018214290961623192, -0.0349898561835289, 0.010553067550063133, 0.009215050376951694, 0.00868272129446268, 0.03240014612674713, -0.004478039685636759, -0.017897771671414375, -0.009438052773475647, -0.014962767250835896, -0.003569843014702201, 0.018530812114477158, 0.02924933284521103, 0.0018083007307723165, 0.0029098265804350376, -0.0023361339699476957, -0.007431028410792351, -3.0151459213811904e-05, 0.015178576111793518, -0.004916851874440908, 0.018185516819357872, 0.018142355605959892, -0.0067943911999464035, 0.026472589001059532, -0.01988321542739868, 0.017926545813679695, 0.04998139664530754, -0.0366012305021286, 0.01128681842237711, -0.021106135100126266, 0.02832854725420475, -0.025278443470597267, -0.009682636708021164, 0.004154325928539038, 0.021897435188293457, 0.005571472924202681, 0.01586916483938694, 0.021868659183382988, -0.03099019266664982, -6.738415686413646e-05, -0.0005327787948772311, -0.018631523475050926, 0.015293674543499947, -0.00353387463837862, 0.0028648662846535444, -0.004891674034297466, -0.010668165050446987, -0.00961789395660162, -0.00768999895080924, 0.017797060310840607, -0.020631354302167892, -0.011660887859761715, 0.005085902288556099, 0.012171635404229164, 0.008121617138385773, 0.028932811692357063, -0.031076516956090927, 0.009229437448084354, 0.003350436920300126, -0.005826846696436405, -0.004902464337646961, 0.042989183217287064, -0.009891252033412457, -0.008855368942022324, -0.00239368318580091, -0.01040200050920248, -0.010660971514880657, -0.012495349161326885, 0.008071262389421463, -0.02008463814854622, 0.01421462930738926, -0.018228679895401, -0.014862056821584702, -0.011768791824579239, -0.01502031646668911, 0.004028437193483114, -0.0033971955999732018, -0.008164779283106327, 0.01254570484161377, -0.008006519638001919, -0.016789952293038368, -0.020314833149313927, -0.003051901003345847, -0.003602214390411973, -0.012747126631438732, 0.0015700114890933037, -0.03668755292892456, -0.0027929299976676702, 0.020616967231035233, 0.011761598289012909, -0.011222075670957565, 0.03196852654218674, 0.009876864962279797, 0.02583954855799675, -0.009862477891147137, 0.0027821394614875317, -0.0038665805477648973, 0.006823165807873011, 0.017063310369849205, -0.020199736580252647, 0.041406579315662384, -0.0116033386439085, -0.011761598289012909, 0.023494422435760498, -0.011876696720719337, -0.023839715868234634, -0.09668249636888504, -0.031076516956090927, -0.00787703413516283, 0.022789444774389267, -0.014164273627102375, 0.030242053791880608, 0.003294686321169138, 0.033579904586076736, -0.007956163957715034, 0.0083446204662323, -0.0047190263867378235, -0.04445668309926987, -0.0008317644242197275, 0.012351476587355137, -0.0026076938956975937, 0.011027847416698933, -0.02244415134191513, -0.017048921436071396, 0.0200558640062809, 0.03429926559329033, 0.02419939823448658, -0.021839885041117668, 0.011970213614404202, -0.013207519426941872, -0.028012026101350784, -0.003449349431321025, -0.017782673239707947, 0.0173510555177927, 0.005963526200503111, 0.019523533061146736, 0.0380687341094017, 0.003035715315490961, 0.04742046073079109, -0.012588866986334324, -0.03691774979233742, -0.027925703674554825, -0.018199903890490532, 0.003346840152516961, 0.02432888373732567, -0.042845308780670166, 0.01912068948149681, 0.00871868897229433, 0.009286986663937569, 0.005618231371045113, -0.023091578856110573, 0.005111079663038254, -0.02248731255531311, 0.03752201795578003, 0.04575153812766075, -0.01504909060895443, -0.0402555987238884, 0.02737898752093315, -0.00015803524001967162, -0.021019810810685158, 0.003553657326847315, -0.007510158233344555, 0.03435681387782097, 0.006114592310041189, -0.00917188823223114, -0.007315929979085922, 0.008136005140841007, -0.0019332902738824487, 0.015192964114248753, 0.009653862565755844, -0.026961755007505417, 0.009898446500301361, 0.0005602045566774905, -0.03196852654218674, 0.03087509423494339, -0.01756686344742775, -0.017048921436071396, 0.022329052910208702, -0.025666899979114532, 0.009416472166776657, -0.01910630241036415, 0.0036201984621584415, -1.7815495084505528e-05, -0.02760918252170086, 0.0330907367169857, -0.010898361913859844, -0.021192457526922226, -0.018084805458784103, -0.021437041461467743, 0.01674678921699524, 0.0108336191624403, 0.014077950268983841, 0.009445247240364552, 0.021264394745230675, 0.04825492575764656, -0.018746620044112206, -0.0029637787956744432, 0.0055462950840592384, 0.004226262215524912, -0.004301795735955238, 0.010977491736412048, 0.03979520499706268, 0.00434855418279767, 0.014876443892717361, 0.012466575019061565, -0.01500592939555645, -0.05447022616863251, 0.0012391041964292526, -0.08776238560676575, 0.04744923859834671, 0.011761598289012909, 0.0018631522543728352, 0.0025213700719177723, 0.008898530155420303, 0.0028504792135208845, -0.029436366632580757, -0.013416134752333164, 0.023710230365395546, -0.019796893000602722, 0.02408429980278015, 0.0043737320229411125, -0.000470733706606552, -0.0023667069617658854, -0.02506263554096222, 0.026904206722974777, 0.011704049073159695, 0.03904706612229347, -0.0019566696137189865, -0.0036148030776530504, -0.015538258478045464, -0.04005417600274086, 0.032227497547864914, 0.00518661318346858, -0.0316232331097126, 0.006064237095415592, -0.001987242605537176, 0.00768280541524291, -0.014387276023626328, 0.009718605317175388, -0.045435018837451935, 0.014185854233801365, 0.02838609553873539, 0.0023325372021645308, -0.02509140968322754, 0.005607441067695618, 0.0038342091720551252, 0.037867311388254166, 0.008934498764574528, -0.019537921994924545, -0.015192964114248753, -0.008848174475133419, 0.003163402434438467, 0.01165369339287281, -0.013092420995235443, -0.02582516148686409, 0.009768960997462273, 0.013660718686878681, 0.003801837796345353, 0.024846825748682022, 0.009028015658259392, -0.02255924977362156, 0.0010457751341164112, -0.0034835191909223795, -0.01376143004745245, 0.005657796282321215, 0.01582600362598896, -0.012459381483495235, -0.027911316603422165, 0.037032850086688995, -0.012905387207865715, 0.021293168887495995, -0.027767444029450417, 0.013711074367165565, 0.0028217046055942774, 0.00022333998640533537, 0.004111164249479771, 0.02421378530561924, 0.02842925861477852, -0.020257284864783287, -0.013689493760466576, 0.02169601246714592, 0.008215134963393211, 1.9108101696474478e-05, 0.010934329591691494, -0.018674684688448906, -0.006970635149627924, -0.010596228763461113, 0.0012220193166285753, 0.017868997529149055, -0.03596819192171097, -0.013833366334438324, -0.021422654390335083, 0.01211408618837595, -0.010437969118356705, 0.003111248603090644, -0.0020879535004496574, -0.014905218034982681, -0.013020484708249569, -0.03919094055891037, 0.010999072343111038, -0.012452187947928905, -0.0074238344095647335, 0.004161519464105368, 0.016228847205638885, 0.0009432657971046865, 0.013847753405570984, 0.006326804868876934, -0.004686655011028051, 0.006736841984093189, -0.004014050122350454, -0.02337932400405407, -0.025494253262877464, -0.022918930277228355, -0.007625256199389696, -0.019782504066824913, -0.020616967231035233, -0.007546126376837492, -0.008840980939567089, 0.022213954478502274, 0.01412830501794815, -0.003382808296009898, 0.026184841990470886, -0.021278781816363335, 0.0027929299976676702, 0.022199567407369614, -0.00439531309530139, -0.004470846150070429, 0.023882878944277763, 0.0055966502986848354, 0.01752370223402977, 0.03401152044534683, 0.007524545304477215, 0.023739006370306015, 0.011214882135391235, 0.029666563495993614, -0.02242976427078247, 0.0077187735587358475, -0.030673673376441002, 0.011869503185153008, -0.005143451038748026, -0.013545620255172253, -0.012624834664165974, -0.02844364568591118, -0.0009243824752047658, -0.037867311388254166, 0.020214123651385307, -0.005165032111108303, 0.07366285473108292, 0.0038521932438015938, -0.020559417083859444, -0.03855790197849274, 0.018070418387651443, -0.007218815851956606, 0.01831500232219696, -0.0014189451467245817, 0.028760164976119995, -0.027666732668876648, 0.022890156134963036, 0.0019530727295204997, 0.011862309649586678, 0.0022030516993254423, -0.012243571691215038, -0.008661140687763691, 0.0015169584657996893, 0.02845803275704384, -0.010711327195167542, 0.007319526746869087, 0.022170793265104294, 0.009186276234686375, 0.019422823563218117, 0.018948042765259743, 0.0028810519725084305, -0.005330485757440329, 0.03004063293337822, -0.002987158251926303, -0.0058700088411569595, -0.04316182807087898, -0.0008155787363648415, -0.012286733835935593, 0.011552982963621616, -0.00666850246489048, 0.0076612248085439205, -0.022026920691132545, -0.001176159828901291, 0.0009864276507869363, 0.00605344632640481, 0.010013544000685215, -0.011912664398550987, 0.010466743260622025, -0.0481685996055603, -0.020458707585930824, 0.016818726435303688, -0.02411307580769062, 0.017912158742547035, 0.002213842235505581, -0.022026920691132545]\n"
     ]
    }
   ],
   "source": [
    "print(nodes[0].get_embedding())"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "47b9f7ee",
   "metadata": {},
   "source": [
    "### Load documents and metadata into TimescaleVector vectorstore\n",
    "Now that we have prepared our nodes and added embeddings to them, let's add them into our TimescaleVector vectorstore.\n",
    "\n",
    "We'll create a Timescale Vector instance from the list of nodes we created.\n",
    "\n",
    "First, we'll define a collection name, which will be the name of our table in the PostgreSQL database. \n",
    "\n",
    "We'll also define a time delta, which we pass to the `time_partition_interval` argument, which will be used to as the interval for partitioning the data by time. Each partition will consist of data for the specified length of time. We'll use 7 days for simplicity, but you can pick whatever value make sense for your use case -- for example if you query recent vectors frequently you might want to use a smaller time delta like 1 day, or if you query vectors over a decade long time period then you might want to use a larger time delta like 6 months or 1 year.\n",
    "\n",
    "Then we'll add the nodes to the Timescale Vector vectorstore."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "db159054",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create a timescale vector store and add the newly created nodes to it\n",
    "ts_vector_store = TimescaleVectorStore.from_params(\n",
    "    service_url=TIMESCALE_SERVICE_URL,\n",
    "    table_name=\"li_commit_history\",\n",
    "    time_partition_interval=timedelta(days=7),\n",
    ")\n",
    "_ = ts_vector_store.add(nodes)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "a44793e3",
   "metadata": {},
   "source": [
    "### Querying vectors by time and similarity\n",
    "\n",
    "Now that we have loaded our documents into TimescaleVector, we can query them by time and similarity.\n",
    "\n",
    "TimescaleVector provides multiple methods for querying vectors by doing similarity search with time-based filtering Let's take a look at each method below.\n",
    "\n",
    "First we define a query string and get the vector embedding for the query string."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2984d681",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Define query and generate embedding for it\n",
    "query_str = \"What's new with TimescaleDB functions?\"\n",
    "embed_model = OpenAIEmbedding()\n",
    "query_embedding = embed_model.get_query_embedding(query_str)"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "317d302e",
   "metadata": {},
   "source": [
    "Then we set some variables which we'll use in our time filters."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "a05a9597",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Time filter variables for query\n",
    "start_dt = datetime(\n",
    "    2023, 8, 1, 22, 10, 35\n",
    ")  # Start date = 1 August 2023, 22:10:35\n",
    "end_dt = datetime(\n",
    "    2023, 8, 30, 22, 10, 35\n",
    ")  # End date = 30 August 2023, 22:10:35\n",
    "td = timedelta(days=7)  # Time delta = 7 days"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "374662b5",
   "metadata": {},
   "source": [
    "Method 1: Filter within a provided start date and end date."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "b92bf8a1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "VectorStoreQueryResult(nodes=[TextNode(id_='22747180-31f1-11ee-bd8e-101e36c28c91', embedding=None, metadata={'commit': ' 7aeed663b9c0f337b530fd6cad47704a51a9b2ec', 'author': 'Dmitry Simonenko', 'date': '2023-08-3 14:30:23+0500'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='3273f20a98f02c75847896b929888b05e8751ae5e258d7feb8605bd5290ef8ca', text='Thu Aug 3 14:30:23 2023 +0300 Dmitry Simonenko Feature flags for TimescaleDB features This PR adds several GUCs which allow to enable/disable major timescaledb features:  - enable_hypertable_create - enable_hypertable_compression - enable_cagg_create - enable_policy_create ', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), TextNode(id_='faa8ea00-4686-11ee-b933-c2c7df407c25', embedding=None, metadata={'commit': ' e4facda540286b0affba47ccc63959fefe2a7b26', 'author': 'Sven Klemm', 'date': '2023-08-29 18:13:24+0320'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='6f45ab1cccf673ddf75c625983b6cf2f4a66bbf865a4c1c65025997a470f3bb3', text='Tue Aug 29 18:13:24 2023 +0200 Sven Klemm Add compatibility layer for _timescaledb_internal functions With timescaledb 2.12 all the functions present in _timescaledb_internal were moved into the _timescaledb_functions schema to improve schema security. This patch adds a compatibility layer so external callers of these internal functions will not break and allow for more flexibility when migrating. ', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), TextNode(id_='d7080180-40d2-11ee-af6f-f43e81a0925a', embedding=None, metadata={'commit': ' cf04496e4b4237440274eb25e4e02472fc4e06fc', 'author': 'Sven Klemm', 'date': '2023-08-22 12:01:19+0320'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='d5a20dc83ae04f44aa901ba2f654e80ca68cb21f6a313bd91afcd91e404b471e', text='Tue Aug 22 12:01:19 2023 +0200 Sven Klemm Move utility functions to _timescaledb_functions schema To increase schema security we do not want to mix our own internal objects with user objects. Since chunks are created in the _timescaledb_internal schema our internal functions should live in a different dedicated schema. This patch make the necessary adjustments for the following functions:  - generate_uuid() - get_git_commit() - get_os_info() - tsl_loaded() ', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), TextNode(id_='01b10780-4649-11ee-a375-5719b2881af3', embedding=None, metadata={'commit': ' a9751ccd5eb030026d7b975d22753f5964972389', 'author': 'Sven Klemm', 'date': '2023-08-29 10:49:47+0320'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='8fde14d147def41808d82bf2ffa35e1e0ed78b0331962907cee856af34a34e44', text='Tue Aug 29 10:49:47 2023 +0200 Sven Klemm Move partitioning functions to _timescaledb_functions schema To increase schema security we do not want to mix our own internal objects with user objects. Since chunks are created in the _timescaledb_internal schema our internal functions should live in a different dedicated schema. This patch make the necessary adjustments for the following functions:  - get_partition_for_key(val anyelement) - get_partition_hash(val anyelement) ', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), TextNode(id_='e7ba7f80-36af-11ee-9479-6c18a6a65db1', embedding=None, metadata={'commit': ' 44eab9cf9bef34274c88efd37a750eaa74cd8044', 'author': 'Konstantina Skovola', 'date': '2023-08-9 15:26:03+0500'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='f0db9c719928ecc16653bbf0a44f1eaeb221ac79dae16fd36710044d1561dbfa', text='Wed Aug 9 15:26:03 2023 +0300 Konstantina Skovola Release 2.11.2 This release contains bug fixes since the 2.11.1 release. We recommend that you upgrade at the next available opportunity.  **Features** * #5923 Feature flags for TimescaleDB features  **Bugfixes** * #5680 Fix DISTINCT query with JOIN on multiple segmentby columns * #5774 Fixed two bugs in decompression sorted merge code * #5786 Ensure pg_config --cppflags are passed * #5906 Fix quoting owners in sql scripts. * #5912 Fix crash in 1-step integer policy creation  **Thanks** * @mrksngl for submitting a PR to fix extension upgrade scripts * @ericdevries for reporting an issue with DISTINCT queries using segmentby columns of compressed hypertable ', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n')], similarities=[0.18141598590707553, 0.1821951378700205, 0.1948705199438009, 0.19657938500765504, 0.19664154042725346], ids=[UUID('22747180-31f1-11ee-bd8e-101e36c28c91'), UUID('faa8ea00-4686-11ee-b933-c2c7df407c25'), UUID('d7080180-40d2-11ee-af6f-f43e81a0925a'), UUID('01b10780-4649-11ee-a375-5719b2881af3'), UUID('e7ba7f80-36af-11ee-9479-6c18a6a65db1')])"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Query the vector database\n",
    "vector_store_query = VectorStoreQuery(\n",
    "    query_embedding=query_embedding, similarity_top_k=5\n",
    ")\n",
    "\n",
    "# return most similar vectors to query between start date and end date date range\n",
    "# returns a VectorStoreQueryResult object\n",
    "query_result = ts_vector_store.query(\n",
    "    vector_store_query, start_date=start_dt, end_date=end_dt\n",
    ")\n",
    "query_result"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "7f6f2825",
   "metadata": {},
   "source": [
    "Let's inspect the nodes that were returned from the similarity search:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "993e8cf9",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------------------------------------------------------------------------\n",
      "2023-08-3 14:30:23+0500\n",
      "commit:  7aeed663b9c0f337b530fd6cad47704a51a9b2ec\n",
      "author: Dmitry Simonenko\n",
      "date: 2023-08-3 14:30:23+0500\n",
      "\n",
      "Thu Aug 3 14:30:23 2023 +0300 Dmitry Simonenko Feature flags for TimescaleDB features This PR adds several GUCs which allow to enable/disable major timescaledb features:  - enable_hypertable_create - enable_hypertable_compression - enable_cagg_create - enable_policy_create\n",
      "--------------------------------------------------------------------------------\n",
      "2023-08-29 18:13:24+0320\n",
      "commit:  e4facda540286b0affba47ccc63959fefe2a7b26\n",
      "author: Sven Klemm\n",
      "date: 2023-08-29 18:13:24+0320\n",
      "\n",
      "Tue Aug 29 18:13:24 2023 +0200 Sven Klemm Add compatibility layer for _timescaledb_internal functions With timescaledb 2.12 all the functions present in _timescaledb_internal were moved into the _timescaledb_functions schema to improve schema security. This patch adds a compatibility layer so external callers of these internal functions will not break and allow for more flexibility when migrating.\n",
      "--------------------------------------------------------------------------------\n",
      "2023-08-22 12:01:19+0320\n",
      "commit:  cf04496e4b4237440274eb25e4e02472fc4e06fc\n",
      "author: Sven Klemm\n",
      "date: 2023-08-22 12:01:19+0320\n",
      "\n",
      "Tue Aug 22 12:01:19 2023 +0200 Sven Klemm Move utility functions to _timescaledb_functions schema To increase schema security we do not want to mix our own internal objects with user objects. Since chunks are created in the _timescaledb_internal schema our internal functions should live in a different dedicated schema. This patch make the necessary adjustments for the following functions:  - generate_uuid() - get_git_commit() - get_os_info() - tsl_loaded()\n",
      "--------------------------------------------------------------------------------\n",
      "2023-08-29 10:49:47+0320\n",
      "commit:  a9751ccd5eb030026d7b975d22753f5964972389\n",
      "author: Sven Klemm\n",
      "date: 2023-08-29 10:49:47+0320\n",
      "\n",
      "Tue Aug 29 10:49:47 2023 +0200 Sven Klemm Move partitioning functions to _timescaledb_functions schema To increase schema security we do not want to mix our own internal objects with user objects. Since chunks are created in the _timescaledb_internal schema our internal functions should live in a different dedicated schema. This patch make the necessary adjustments for the following functions:  - get_partition_for_key(val anyelement) - get_partition_hash(val anyelement)\n",
      "--------------------------------------------------------------------------------\n",
      "2023-08-9 15:26:03+0500\n",
      "commit:  44eab9cf9bef34274c88efd37a750eaa74cd8044\n",
      "author: Konstantina Skovola\n",
      "date: 2023-08-9 15:26:03+0500\n",
      "\n",
      "Wed Aug 9 15:26:03 2023 +0300 Konstantina Skovola Release 2.11.2 This release contains bug fixes since the 2.11.1 release. We recommend that you upgrade at the next available opportunity.  **Features** * #5923 Feature flags for TimescaleDB features  **Bugfixes** * #5680 Fix DISTINCT query with JOIN on multiple segmentby columns * #5774 Fixed two bugs in decompression sorted merge code * #5786 Ensure pg_config --cppflags are passed * #5906 Fix quoting owners in sql scripts. * #5912 Fix crash in 1-step integer policy creation  **Thanks** * @mrksngl for submitting a PR to fix extension upgrade scripts * @ericdevries for reporting an issue with DISTINCT queries using segmentby columns of compressed hypertable\n"
     ]
    }
   ],
   "source": [
    "# for each node in the query result, print the node metadata date\n",
    "for node in query_result.nodes:\n",
    "    print(\"-\" * 80)\n",
    "    print(node.metadata[\"date\"])\n",
    "    print(node.get_content(metadata_mode=\"all\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "6df2d314",
   "metadata": {},
   "source": [
    "Note how the query only returns results within the specified date range."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "8014db61",
   "metadata": {},
   "source": [
    "Method 2: Filter within a provided start date, and a time delta later.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "33098109",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------------------------------------------------------------------------\n",
      "2023-08-3 14:30:23+0500\n",
      "commit:  7aeed663b9c0f337b530fd6cad47704a51a9b2ec\n",
      "author: Dmitry Simonenko\n",
      "date: 2023-08-3 14:30:23+0500\n",
      "\n",
      "Thu Aug 3 14:30:23 2023 +0300 Dmitry Simonenko Feature flags for TimescaleDB features This PR adds several GUCs which allow to enable/disable major timescaledb features:  - enable_hypertable_create - enable_hypertable_compression - enable_cagg_create - enable_policy_create\n",
      "--------------------------------------------------------------------------------\n",
      "2023-08-7 19:49:47+-500\n",
      "commit:  5bba74a2ec083728f8e93e09d03d102568fd72b5\n",
      "author: Fabrízio de Royes Mello\n",
      "date: 2023-08-7 19:49:47+-500\n",
      "\n",
      "Mon Aug 7 19:49:47 2023 -0300 Fabrízio de Royes Mello Relax strong table lock when refreshing a CAGG When refreshing a Continuous Aggregate we take a table lock on _timescaledb_catalog.continuous_aggs_invalidation_threshold when processing the invalidation logs (the first transaction of the refresh Continuous Aggregate procedure). It means that even two different Continuous Aggregates over two different hypertables will wait each other in the first phase of the refreshing procedure. Also it lead to problems when a pg_dump is running because it take an AccessShareLock on tables so Continuous Aggregate refresh execution will wait until the pg_dump finish.  Improved it by relaxing the strong table-level lock to a row-level lock so now the Continuous Aggregate refresh procedure can be executed in multiple sessions with less locks.  Fix #3554\n",
      "--------------------------------------------------------------------------------\n",
      "2023-08-3 14:36:39+0500\n",
      "commit:  2863daf3df83c63ee36c0cf7b66c522da5b4e127\n",
      "author: Dmitry Simonenko\n",
      "date: 2023-08-3 14:36:39+0500\n",
      "\n",
      "Thu Aug 3 14:36:39 2023 +0300 Dmitry Simonenko Support CREATE INDEX ONLY ON main table This PR adds support for CREATE INDEX ONLY ON clause which allows to create index only on the main table excluding chunks.  Fix #5908\n",
      "--------------------------------------------------------------------------------\n",
      "2023-08-2 20:24:14+0140\n",
      "commit:  3af0d282ea71d9a8f27159a6171e9516e62ec9cb\n",
      "author: Lakshmi Narayanan Sreethar\n",
      "date: 2023-08-2 20:24:14+0140\n",
      "\n",
      "Wed Aug 2 20:24:14 2023 +0100 Lakshmi Narayanan Sreethar PG16: ExecInsertIndexTuples requires additional parameter PG16 adds a new boolean parameter to the ExecInsertIndexTuples function to denote if the index is a BRIN index, which is then used to determine if the index update can be skipped. The fix also removes the INDEX_ATTR_BITMAP_ALL enum value.  Adapt these changes by updating the compat function to accomodate the new parameter added to the ExecInsertIndexTuples function and using an alternative for the removed INDEX_ATTR_BITMAP_ALL enum value.  postgres/postgres@19d8e23\n",
      "--------------------------------------------------------------------------------\n",
      "2023-08-7 16:36:17+0500\n",
      "commit:  373c55662ca5f8a2993abf9b2aa7f5f4006b3229\n",
      "author: Konstantina Skovola\n",
      "date: 2023-08-7 16:36:17+0500\n",
      "\n",
      "Mon Aug 7 16:36:17 2023 +0300 Konstantina Skovola Fix ordered append for partially compressed chunks In the exclusive presence of partially compressed chunks, this optimization was not applied because no pathkeys were supplied. Additionally, this patch makes sure that if applicable, the `enable_decompression_sorted_merge` optimization is chosen for the path, since it is more beneficial due to the ability to push down the sort below DecompressChunk.\n"
     ]
    }
   ],
   "source": [
    "vector_store_query = VectorStoreQuery(\n",
    "    query_embedding=query_embedding, similarity_top_k=5\n",
    ")\n",
    "\n",
    "# return most similar vectors to query from start date and a time delta later\n",
    "query_result = ts_vector_store.query(\n",
    "    vector_store_query, start_date=start_dt, time_delta=td\n",
    ")\n",
    "\n",
    "for node in query_result.nodes:\n",
    "    print(\"-\" * 80)\n",
    "    print(node.metadata[\"date\"])\n",
    "    print(node.get_content(metadata_mode=\"all\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "90b4967a",
   "metadata": {},
   "source": [
    "Once again, notice how only nodes between the start date (1 August) and the defined time delta later (7 days later) are returned."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "082a4215",
   "metadata": {},
   "source": [
    "Method 3: Filter within a provided end date and a time delta earlier."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "ba1a6874",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "--------------------------------------------------------------------------------\n",
      "2023-08-29 18:13:24+0320\n",
      "commit:  e4facda540286b0affba47ccc63959fefe2a7b26\n",
      "author: Sven Klemm\n",
      "date: 2023-08-29 18:13:24+0320\n",
      "\n",
      "Tue Aug 29 18:13:24 2023 +0200 Sven Klemm Add compatibility layer for _timescaledb_internal functions With timescaledb 2.12 all the functions present in _timescaledb_internal were moved into the _timescaledb_functions schema to improve schema security. This patch adds a compatibility layer so external callers of these internal functions will not break and allow for more flexibility when migrating.\n",
      "--------------------------------------------------------------------------------\n",
      "2023-08-29 10:49:47+0320\n",
      "commit:  a9751ccd5eb030026d7b975d22753f5964972389\n",
      "author: Sven Klemm\n",
      "date: 2023-08-29 10:49:47+0320\n",
      "\n",
      "Tue Aug 29 10:49:47 2023 +0200 Sven Klemm Move partitioning functions to _timescaledb_functions schema To increase schema security we do not want to mix our own internal objects with user objects. Since chunks are created in the _timescaledb_internal schema our internal functions should live in a different dedicated schema. This patch make the necessary adjustments for the following functions:  - get_partition_for_key(val anyelement) - get_partition_hash(val anyelement)\n",
      "--------------------------------------------------------------------------------\n",
      "2023-08-28 23:26:23+0320\n",
      "commit:  b2a91494a11d8b82849b6f11f9ea6dc26ef8a8cb\n",
      "author: Sven Klemm\n",
      "date: 2023-08-28 23:26:23+0320\n",
      "\n",
      "Mon Aug 28 23:26:23 2023 +0200 Sven Klemm Move ddl_internal functions to _timescaledb_functions schema To increase schema security we do not want to mix our own internal objects with user objects. Since chunks are created in the _timescaledb_internal schema our internal functions should live in a different dedicated schema. This patch make the necessary adjustments for the following functions:  - chunk_constraint_add_table_constraint(_timescaledb_catalog.chunk_constraint) - chunk_drop_replica(regclass,name) - chunk_index_clone(oid) - chunk_index_replace(oid,oid) - create_chunk_replica_table(regclass,name) - drop_stale_chunks(name,integer[]) - health() - hypertable_constraint_add_table_fk_constraint(name,name,name,integer) - process_ddl_event() - wait_subscription_sync(name,name,integer,numeric)\n",
      "--------------------------------------------------------------------------------\n",
      "2023-08-29 14:47:57+0320\n",
      "commit:  08231c8aacd17152f315ad36d95c031fb46073aa\n",
      "author: Jan Nidzwetzki\n",
      "date: 2023-08-29 14:47:57+0320\n",
      "\n",
      "Tue Aug 29 14:47:57 2023 +0200 Jan Nidzwetzki Export is_decompress_chunk_path / is_gapfill_path This patch adds the 'ts_' prefix to the function names of is_decompress_chunk_path and is_gapfill_path and makes them available for use by other parts of TimescaleDB.\n",
      "--------------------------------------------------------------------------------\n",
      "2023-08-28 15:32:54+0320\n",
      "commit:  6576d969b319dac8e7fd08a9cf4cfc8197b34d1d\n",
      "author: Sven Klemm\n",
      "date: 2023-08-28 15:32:54+0320\n",
      "\n",
      "Mon Aug 28 15:32:54 2023 +0200 Sven Klemm Move log invalidation functions to _timescaledb_functions schema To increase schema security we do not want to mix our own internal objects with user objects. Since chunks are created in the _timescaledb_internal schema our internal functions should live in a different dedicated schema. This patch make the necessary adjustments for the following functions:  - cagg_watermark(integer) - cagg_watermark_materialized(integer) - hypertable_invalidation_log_delete(integer) - invalidation_cagg_log_add_entry(integer,bigint,bigint) - invalidation_hyper_log_add_entry(integer,bigint,bigint) - invalidation_process_cagg_log(integer,integer,regtype,bigint,bigint,integer[],bigint[],bigint[]) - invalidation_process_cagg_log(integer,integer,regtype,bigint,bigint,integer[],bigint[],bigint[],text[]) - invalidation_process_hypertable_log(integer,integer,regtype,integer[],bigint[],bigint[]) - invalidation_process_hypertable_log(integer,integer,regtype,integer[],bigint[],bigint[],text[]) - materialization_invalidation_log_delete(integer)\n"
     ]
    }
   ],
   "source": [
    "vector_store_query = VectorStoreQuery(\n",
    "    query_embedding=query_embedding, similarity_top_k=5\n",
    ")\n",
    "\n",
    "# return most similar vectors to query from end date and a time delta earlier\n",
    "query_result = ts_vector_store.query(\n",
    "    vector_store_query, end_date=end_dt, time_delta=td\n",
    ")\n",
    "\n",
    "for node in query_result.nodes:\n",
    "    print(\"-\" * 80)\n",
    "    print(node.metadata[\"date\"])\n",
    "    print(node.get_content(metadata_mode=\"all\"))"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "e1725d33",
   "metadata": {},
   "source": [
    "The main takeaway is that in each result above, only vectors within the specified time range are returned. These queries are very efficient as they only need to search the relevant partitions."
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "67a6b0f5",
   "metadata": {},
   "source": [
    "## 4. Using TimescaleVector store as a Retriever and Query engine \n",
    "\n",
    "Now that we've explored basic similarity search and similarity search with time-based filters, let's look at how to these features of Timescale Vector with LLamaIndex's retriever and query engine.\n",
    "\n",
    "First we'll look at how to use TimescaleVector as a [retriever](https://gpt-index.readthedocs.io/en/latest/api_reference/query/retrievers.html), specifically a [Vector Store Retriever](https://gpt-index.readthedocs.io/en/latest/api_reference/query/retrievers/vector_store.html).\n",
    "\n",
    "To constrain the nodes retrieved to a relevant time-range, we can use TimescaleVector's time filters. We simply pass the time filter parameters as `vector_strored_kwargs` when creating the retriever."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "613f5a32",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[NodeWithScore(node=TextNode(id_='22747180-31f1-11ee-bd8e-101e36c28c91', embedding=None, metadata={'commit': ' 7aeed663b9c0f337b530fd6cad47704a51a9b2ec', 'author': 'Dmitry Simonenko', 'date': '2023-08-3 14:30:23+0500'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='3273f20a98f02c75847896b929888b05e8751ae5e258d7feb8605bd5290ef8ca', text='Thu Aug 3 14:30:23 2023 +0300 Dmitry Simonenko Feature flags for TimescaleDB features This PR adds several GUCs which allow to enable/disable major timescaledb features:  - enable_hypertable_create - enable_hypertable_compression - enable_cagg_create - enable_policy_create ', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.1813839050209377),\n",
       " NodeWithScore(node=TextNode(id_='b5583780-3574-11ee-871a-5a8c45d660c8', embedding=None, metadata={'commit': ' 5bba74a2ec083728f8e93e09d03d102568fd72b5', 'author': 'Fabrízio de Royes Mello', 'date': '2023-08-7 19:49:47+-500'}, excluded_embed_metadata_keys=[], excluded_llm_metadata_keys=[], relationships={}, hash='ec25a09b9dd34ed2650aefc2ce71e1b11fa471ffc43683715de788d202c6cdc8', text='Mon Aug 7 19:49:47 2023 -0300 Fabrízio de Royes Mello Relax strong table lock when refreshing a CAGG When refreshing a Continuous Aggregate we take a table lock on _timescaledb_catalog.continuous_aggs_invalidation_threshold when processing the invalidation logs (the first transaction of the refresh Continuous Aggregate procedure). It means that even two different Continuous Aggregates over two different hypertables will wait each other in the first phase of the refreshing procedure. Also it lead to problems when a pg_dump is running because it take an AccessShareLock on tables so Continuous Aggregate refresh execution will wait until the pg_dump finish.  Improved it by relaxing the strong table-level lock to a row-level lock so now the Continuous Aggregate refresh procedure can be executed in multiple sessions with less locks.  Fix #3554 ', start_char_idx=None, end_char_idx=None, text_template='{metadata_str}\\n\\n{content}', metadata_template='{key}: {value}', metadata_seperator='\\n'), score=0.23511557892997959)]"
      ]
     },
     "execution_count": null,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "from llama_index import VectorStoreIndex\n",
    "from llama_index.storage import StorageContext\n",
    "\n",
    "index = VectorStoreIndex.from_vector_store(ts_vector_store)\n",
    "retriever = index.as_retriever(\n",
    "    vector_store_kwargs=({\"start_date\": start_dt, \"time_delta\": td})\n",
    ")\n",
    "retriever.retrieve(\"What's new with TimescaleDB functions?\")"
   ]
  },
  {
   "attachments": {},
   "cell_type": "markdown",
   "id": "201a517f",
   "metadata": {},
   "source": [
    "Next we'll look at how to use TimescaleVector as a [query engine](https://gpt-index.readthedocs.io/en/latest/api_reference/query/query_engines.html).\n",
    "\n",
    "Once again, we use TimescaleVector's time filters to constrain the search to a relevant time range by passing our time filter parameters as `vector_strored_kwargs` when creating the query engine."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "8d441771",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "TimescaleDB functions have undergone changes recently. These changes were made by Sven Klemm on August 29, 2023. The changes involve adding a compatibility layer for _timescaledb_internal functions. This layer ensures that external callers of these internal functions will not break and allows for more flexibility when migrating.\n"
     ]
    }
   ],
   "source": [
    "index = VectorStoreIndex.from_vector_store(ts_vector_store)\n",
    "query_engine = index.as_query_engine(\n",
    "    vector_store_kwargs=({\"start_date\": start_dt, \"end_date\": end_dt})\n",
    ")\n",
    "\n",
    "# query_str = \"What's new with TimescaleDB? List 3 new features\"\n",
    "query_str = (\n",
    "    \"What's new with TimescaleDB functions? When were these changes made and\"\n",
    "    \" by whom?\"\n",
    ")\n",
    "response = query_engine.query(query_str)\n",
    "print(str(response))"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
